X-CAL: Explicit Calibration for Survival Analysis
A survival model is called well-calibrated when its predicted number of events within any time interval is similar to the observed number. Calibration can be measured using, for instance, distributional calibration (D-CALIBRATION) [Haider et al., 2020], which computes the squared difference between the observed and predicted number of events within different time intervals. Classically, calibration is addressed in post-training analysis. We develop explicit calibration (X-CAL), which turns D-CALIBRATION into a differentiable objective that can be used in survival modeling alongside maximum likelihood estimation and other objectives. X-CAL allows us to directly optimize calibration and strike a desired trade-off between predictive power and calibration. In our experiments, we fit a variety of shallow and deep models on simulated data, on a survival dataset based on MNIST, on length-of-stay prediction using MIMIC-III data, and on brain cancer data from The Cancer Genome Atlas. We show that the models we study can be miscalibrated. We give experimental evidence on these datasets that X-CAL improves D-CALIBRATION without a large decrease in concordance or likelihood.
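To make the measured quantity concrete, the following is a minimal sketch, not the authors' implementation. It assumes fully uncensored data and takes as input each subject's predicted CDF evaluated at the observed event time, so that a calibrated model yields values spread uniformly over [0, 1]. The function names `d_calibration` and `soft_d_calibration`, the equal-width binning, and the parameters `n_bins` and `gamma` are all hypothetical choices made for illustration. In particular, smoothing the bin membership with a product of two sigmoids is an assumed way to obtain a differentiable surrogate; the text above does not specify how the differentiable objective is constructed.

```python
import numpy as np


def _sigmoid(x):
    # Numerically stable logistic function: exp(-log(1 + exp(-x))).
    return np.exp(-np.logaddexp(0.0, -x))


def d_calibration(cdf_values, n_bins=10):
    """Sum over equal-width bins of (observed fraction - expected fraction)^2.

    cdf_values: predicted CDF at each observed event time (uncensored only).
    """
    u = np.asarray(cdf_values, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(u, bins=edges)
    observed = counts / u.size       # fraction of events landing in each bin
    expected = np.diff(edges)        # bin width = fraction expected if calibrated
    return float(np.sum((observed - expected) ** 2))


def soft_d_calibration(cdf_values, n_bins=10, gamma=1000.0):
    """Smooth surrogate: hard bin membership replaced by a sigmoid product.

    Assumption for illustration: membership of u in [lo, hi] is approximated by
    sigmoid(gamma * (u - lo)) * sigmoid(gamma * (hi - u)); larger gamma gives a
    sharper approximation of the hard indicator.
    """
    u = np.asarray(cdf_values, dtype=float)[:, None]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    lo, hi = edges[:-1][None, :], edges[1:][None, :]
    membership = _sigmoid(gamma * (u - lo)) * _sigmoid(gamma * (hi - u))
    observed = membership.mean(axis=0)
    expected = (hi - lo).ravel()
    return float(np.sum((observed - expected) ** 2))


# Evenly spread CDF values mimic a calibrated model; a constant value mimics
# a model that places every event in the same bin.
calibrated = (np.arange(1000) + 0.5) / 1000
miscalibrated = np.full(1000, 0.05)
score_calibrated = d_calibration(calibrated)
score_miscalibrated = d_calibration(miscalibrated)
soft_calibrated = soft_d_calibration(calibrated)
soft_miscalibrated = soft_d_calibration(miscalibrated)
```

In a training setting, the same smooth expression would be written in an automatic-differentiation framework and added to the likelihood with a weight that controls the trade-off; this NumPy version only illustrates the quantity being penalized.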
Review for NeurIPS paper: X-CAL: Explicit Calibration for Survival Analysis
Weaknesses: Any kind of predictive model, and especially a deep neural network, will tend to overfit to the training set, which generally makes predictions on a separate test set too extreme (shrinkage, or a calibration slope of less than 1). The authors' X-CAL procedure ensures good calibration on the training set, but that could result in disappointing calibration when the model is applied to the test set. It seems to me that one would want a procedure that optimizes calibration on a validation set, not the training set. That would then lead to good calibration on the separate test set.
Review for NeurIPS paper: X-CAL: Explicit Calibration for Survival Analysis
For survival analysis, where calibrated models are obviously important, this paper introduces a differentiable plug-and-play regularizer that allows optimizing calibration and choosing a trade-off between prediction accuracy and calibration. This was considered important and the first of its kind. The paper was intensively discussed among the reviewers. In particular, the reviewers debated whether the paper has shown convincingly enough that the method is necessary, because earlier results indicate other methods may produce calibrated results without the added regularizer (Haider et al. 2018). However, the results the authors point to in their response indicate a positive result, which the authors clarified in their anonymous email. The authors are strongly requested to include the additional results in their paper, as this was the bottleneck issue in recommending acceptance, and to take into account the other important points the reviewers raised.